Abstract

Recently we have introduced a novel characterisation of the distribution of twin primes that consists 
of three essential elements. These are: that the twins are most naturally viewed as a subsequence of the 
primes themselves, that the likelihood of a particular prime in sequence being the first element of a twin 
is akin to a fixed-probability random event, and that this probability varies with $\pi_1$, the count of primes
up to this number, in a simple way. Our initial studies made use of two unproven assumptions: that it 
was consistent to model this fundamentally discrete system with a continuous probability density, and 
that the fact that an upper-bound cut-off for prime separations exists could be consistently ignored in the 
continuous analysis. The success of the model served as a posteriori justification for these assumptions. 
Here we perform the analysis using a discrete formalism - not passing to integrals - and explicitly 
include a self-consistently defined cut-off. In addition, we reformulate the model so as to minimise the 
input data needed. 
1 Introduction

In two recent papers, an empirical model for the distribution of twin primes was proposed [1] and some
of its predictions were developed [2]. The foundation of this novel approach to the distribution of twins is
that the sequence of twins is most naturally studied in the context of the primes rather than the natural
numbers.

In [1], empirical evidence was presented which strongly supports the contention that within the set of
prime numbers less than or equal to some number $N$, twins (pairs of primes with arithmetic difference 2)
occur in the manner of fixed-probability random events. This fact lies at the heart of the model. The
probability is not constant, however; rather, it decreases with increasing $N$, in accord with the Hardy-Littlewood
Conjecture. The third essential ingredient of the model is that the manner in which the probability changes
can be expressed simply in terms of $\pi_1(N)$, the number of primes less than or equal to $N$. The reader
is referred to [1], [2] for details. Extensions of the empirical analysis by ourselves and others [3] verify the
persistence of the model up to $N \approx 10^{13}$.

It was noted first by J. Calvo [4], and also independently by M. Wolf [5], that in the course of developing
the model for the distribution of twins we have taken an essentially discrete system of prime separations
(i.e., the number of singleton primes which occur between a pair of neighbouring twins), and modelled it
with a continuous distribution. Another possible shortcoming of the model developed in [1] is that when
we normalise the distribution we integrate over all (continuous) prime separations from 0 (the most likely
separation in the distributional model, see [1]) to $\infty$. It is formally impossible to take this limit at any finite
$N$, or even generally if the number of twins is infinite. We did so in our analysis because we believed that
the error introduced was quite small. This was borne out by the apparent success of the model.

It is the aim of this paper to address the two concerns: discrete analysis versus continuous, and taking
into account the fact that for any $N$ there exists a maximum prime separation. In the next section the
model is briefly reviewed and reformulated. In the following section we shall first reanalyse with sums rather
than passing to the integral representation. Second, we shall self-consistently set an upper bound to the
separations and incorporate its effects into our analysis. As a test of consistency, the predictions for upper
bounds will be compared with the analysis of "prime gaps" in [2].



2 The Model 

The model that we consider is empirical in that it is derived from a direct analysis of the distribution of
twin primes less than $2 \times 10^{11}$. The essential feature which provides the key to the success of the model is
that the distribution of twins is considered in the context of the primes alone rather than within the natural
numbers. The model is based upon the observation that twins less than some number $N$ seem to occur as
fixed-probability random events in the sequence of primes. That is, there is a characteristic distribution of
prime separations which may be expressed in the form

$$\mathcal{P}(s, \pi_1) = A e^{-m s} . \tag{1}$$

Here, $\pi_1$ is the number of primes less than or equal to $N$, $s$ is the prime separation (the number of unpaired
singleton primes occurring between two twins), and $m$ is a decay parameter which is constant for a given $N$,
but varies with $\pi_1(N)$, while $A(\pi_1(N))$ is an overall constant which is fixed by normalisation. $\mathcal{P}(s, \pi_1)$ is
the probability density that a given pair of twins in the sequence of primes up to $N$ has prime separation $s$.
When (1) is assumed to be continuous and extending to infinity, the condition that it be properly normalised,
$\int_0^\infty \mathcal{P}(s, \pi_1) \, ds = 1$, constrains $A = m$.

We chose a representative sample of prime sequences and determined the decay constants for each. We
began our analysis with (5, 7), discarding the anomalous twin (3, 5). The variation of the decay parameters
- the slopes on a plot of log(frequency) versus separation - is well-described by the following function:

$$-m(\pi_1) = -\frac{m_0}{\log(\pi_1)} , \tag{2}$$

where the constant, $m_0$, has been estimated to equal $1.321 \pm 0.008$ in [1].



2.1 Reformulating the Model 

Our empirical model is founded upon a constructive procedure: from an exact knowledge of the distribution 
of separations, we determine the best-fit slope on a graph of log(frequency) vs. separation, giving each datum 
equal weight. The unfortunate aspect of this is that we are limited to the data that we have collected. In 
particular, various groups of researchers have counted primes and twins but they do not appear to have 
kept detailed counts of twin separations. Further, we now believe that we understand better the difficulties 
which led us to eschew characterisation of the behaviour of twins solely in terms of $\pi_1$ and $\pi_2$.

Let us now make a proper case for consideration of an empirical model for the distribution of twins whose
inputs are the counts $\pi_1(N)$ and $\pi_2(N)$. Recalling [1], especially the formulae (7) and (15) and Figure 4
(reproduced here as Figure 1), in which the statistical average separation $\bar{s}_0$ is expressed as the number of
singletons divided by the number of twins, and $\bar{s}$ is the reciprocal of the estimated slope, we write

$$\bar{s}_0 = \frac{\pi_1 - 2\pi_2}{\pi_2} \quad \text{and} \quad \bar{s} = \frac{1}{m} . \tag{3}$$



It was readily apparent that the slopes $m$, determined as described above, and the quantities $1/\bar{s}_0$ as in (3),
closely correspond at large values of $N$. It is equally apparent that they differ significantly at smaller values.
This is the region where the relatively strong enhancement of the few "large" separation events had the
greatest effect in reducing the magnitude of the computed slopes. This in turn enabled the success of our
simple and straightforward empirical model for the variation of the slope with $\pi_1$.
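
As a concrete illustration of (3), the statistical average separation can be obtained from nothing more than two counts. The following is a minimal sketch, not the procedure used to generate the data discussed here: it assumes a plain sieve of Eratosthenes, counts the twin (3, 5) along with the others, and the helper names `prime_and_twin_counts` and `mean_separation` are hypothetical, chosen only for this example.

```python
from math import isqrt


def prime_and_twin_counts(n):
    """Return (pi1, pi2): the number of primes <= n and of twin pairs (p, p + 2) with p + 2 <= n."""
    sieve = bytearray([1]) * (n + 1)          # sieve[k] == 1 means k is still a prime candidate
    sieve[0:2] = b"\x00\x00"                  # 0 and 1 are not prime
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            # strike out the multiples of i, starting from i * i
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    pi1 = sum(sieve)
    pi2 = sum(1 for p in range(3, n - 1) if sieve[p] and sieve[p + 2])
    return pi1, pi2


def mean_separation(pi1, pi2):
    """Statistical average separation: number of singleton primes per twin."""
    return (pi1 - 2 * pi2) / pi2


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        pi1, pi2 = prime_and_twin_counts(n)
        print(n, pi1, pi2, round(mean_separation(pi1, pi2), 3))
```

Such small values of $N$ lie well below the range in which the statistical model is expected to apply; they serve only to make the bookkeeping explicit.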

Figure 1: Synthetic slopes ($1/\bar{s}_0$) using Nicely's data marked with x's, computed slopes from the actual
spectrum of prime separations with error bars, and our empirical fit. All vs. $\log(\pi_1)$. This is Figure 4 in [1].

We set out to understand better the behaviour at the low end of the curve in Figure 1, by reconsideration
of $\bar{s}_0$ as a function of $\log(\pi_1)$. In Figure 2 the $\bar{s}_0$ derived from Nicely's data¹ appear to follow very closely
along a straight line with slope $0.7918 \pm 0.0007$ and y-intercept $-1.194 \pm 0.018$. The negative value for
the y-intercept, implying a positive value for the x-intercept, initially appeared to us to be pathological and
prevented us from arriving at a simple empirical characterisation for the variation of the Nicely data. We
now argue that this pathology is relatively benign, as the x-intercept has such a small value, here $\approx 1.5$,
that $\pi_1(N) \approx \exp(1.5)$ is less than 5, and thus the value of $N$ to which it corresponds is less than 20. We
have no expectation that our statistical model can produce meaningful results for short sequences of primes,
and so this value for the x-intercept is truly and completely extraneous².

1 Nicely's data [6] consist of values of $N$, $\pi_1(N)$, and $\pi_2(N)$. We have adjusted the $\pi_1$ to discard the singleton primes which
appear after the last twin less than $N$.

2 See the Conclusion for further comments about the accuracy of our linear fit and the values of the intercepts.

Figure 2: Statistical average prime separations, $\bar{s}_0$, using Nicely's data and our linear fit vs. $\log(\pi_1)$.

Thus, for the purposes of this paper, we have reformulated the fundamentally empirical model that we
proposed in [1], [2] in such a way that we characterise the probability distribution for the twin separations not
in terms of the "decay constant" $m$ and its proper variation with $N$ as before, but instead in terms of the
"mean separation" $\bar{s}$ and its concomitant proper variation. It is most likely the case that this reformulation
is best suited for further development as it relies exclusively on data obtained by counting primes and
twins. Henceforth we will rewrite (1) as



$$\mathcal{P}(s, \pi_1) = A e^{-s/\bar{s}} . \tag{4}$$

Our empirical model (Figure 2) strongly suggests that $\bar{s}_0$ varies with $\pi_1$ as

$$\bar{s}_0(\pi_1) = S_1 \log(\pi_1) + S_0 , \tag{5}$$

where we have written $S_1$ and $S_0$ for the constants whose values are empirically determined to be $0.7918 \pm 0.0007$
and $-1.194 \pm 0.018$ respectively. We further insist that $\bar{s}_0 > 0$, as discussed above. The relation that exists
between $\bar{s}$ and $\bar{s}_0$ is revealed in the next section.
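
The linear relation (5) and the size of its x-intercept are easy to evaluate directly. The sketch below uses only the central values of the fitted constants quoted above and ignores their uncertainties; the function name `fitted_mean_separation` is a hypothetical label introduced for this example.

```python
import math

S1, S0 = 0.7918, -1.194   # central values of the fitted slope and y-intercept


def fitted_mean_separation(pi1):
    """Linear-in-log fit (5) for the statistical average separation."""
    return S1 * math.log(pi1) + S0


x_intercept = -S0 / S1                 # value of log(pi1) at which the fit crosses zero
pi1_at_zero = math.exp(x_intercept)    # corresponding prime count

if __name__ == "__main__":
    print(round(x_intercept, 3), round(pi1_at_zero, 2))
```

The crossing occurs at a prime count below 5, which is the sense in which the negative y-intercept is harmless for the sequences of primes considered here.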



3 Discrete Analysis 

Normalising the probability distribution for the occurrence of prime separations yields

$$1 = \sum_{s=0}^{L} \mathcal{P}(s, \pi_1) = \sum_{s=0}^{L} A e^{-s/\bar{s}} , \tag{6}$$

where we have interpreted the $\mathcal{P}(s, \pi_1)$ as relative frequencies rather than absolute counts (in which case
the lhs of (6) would equal $\pi_2 - 2$, the total number of twins less two³) and have inserted as an Ansatz the
empirical relation (4). $L$ denotes the maximum prime separation, thereby truncating the sum. A priori $L$
is not specified and a value must be assumed or self-consistently derived.

3 We disregard the twin (3 5) and the prime separations are intervals between neighbouring twins. 

A second relation among $A$, $\bar{s}$ and $L$ is formed by consideration of the frequency-weighted average prime
separation, viz.

$$\bar{s}_0 = \frac{\pi_1 - 2\pi_2}{\pi_2} = \sum_{s=0}^{L} s \, \mathcal{P}(s, \pi_1) = \sum_{s=0}^{L} s A e^{-s/\bar{s}} . \tag{7}$$



In fact, a note of caution is required here. Our empirical analysis discards the first twin and considers
separations, so "$\pi_2$" should be replaced by $\pi_2 - 2$, and to be consistent "$\pi_1$" should be $\pi_1 - 2$ since we pass
over the primes 2 and 3. Incorporating these minor offsets into (7) results in $(\pi_1 - 2\pi_2 + 2)/(\pi_2 - 2)$,
which is only slightly different from the simpler and more straightforward expression that we use.

A third relation among the parameters comes from assigning a cut-off for the probability distribution
for prime separations. The Ansatz (4) has no such cut-off built into it, although one might try to use the
"scale" set by $\bar{s}$ to establish one by fiat (say, $L = 20 \times \bar{s}$). Instead, we shall adopt the general method
utilised in [2] and set a minimum probability threshold with the introduction of a so-called risk factor $f$.
In effect, $f$ separation events with prime separations greater than $L$ are expected to occur in the
context of the probabilistic model. Then we may write

$$\frac{f}{\pi_2} = \sum_{s=L+1}^{\infty} \mathcal{P}(s, \pi_1) , \tag{8}$$

providing a self-consistent cut-off value for the sum over separations.

Thus we have a set of three equations with three known quantities (the counts of primes and twins, $\pi_1$
and $\pi_2$ respectively, for a given $N$, and the risk factor $f$) and three parameters to be determined: $A$, $\bar{s}$, and
$L$. We now proceed to solve these equations in two distinct instances: the first in which we formally set
$f = 0$, in which case $L = \infty$, and the second, where $f$ is specified (non-zero).



3.1 With f = 0 and L = ∞

With $f = 0$ and $L = \infty$, equation (8) has no content and (6) and (7) can be solved exactly. The solutions
are

$$\frac{1}{\bar{s}} = \log\left( 1 + \frac{1}{\bar{s}_0} \right) \tag{9}$$

and

$$A = \frac{1}{1 + \bar{s}_0} . \tag{10}$$
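
For readers who wish to check (9) and (10), the steps can be sketched as follows, assuming only that the sums in (6) and (7) with $L = \infty$ are evaluated as ordinary geometric series. The shorthand $q = e^{-1/\bar{s}}$ is used for convenience.

```latex
\begin{align*}
\sum_{s=0}^{\infty} A q^{s} &= \frac{A}{1-q} = 1
  &&\Rightarrow\quad A = 1 - q , \\
\sum_{s=0}^{\infty} s A q^{s} &= \frac{A q}{(1-q)^{2}} = \frac{q}{1-q} = \bar{s}_0
  &&\Rightarrow\quad q = \frac{\bar{s}_0}{1+\bar{s}_0} , \\
\frac{1}{\bar{s}} &= -\log q = \log\left( 1 + \frac{1}{\bar{s}_0} \right) ,
  \qquad A = 1 - q = \frac{1}{1+\bar{s}_0} .
\end{align*}
```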



We note three things about this solution. The first is that $\bar{s}$ and $A$ depend on $\pi_1$ and $\pi_2$, and hence $N$,
implicitly via $\bar{s}_0$. The second is that with increasing $N$ the density of twins decreases, $\bar{s}_0$ increases, and
$\bar{s} \sim \bar{s}_0$, as is easily seen by expanding (9) in this limit. Third, we remark that M. Wolf obtains a similar
result [5], except that he assumes a priori that $1/\bar{s} \ll 1$ in order to simplify his analysis whereas our result
is exact.
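
The exact solution without a cut-off can be checked numerically. The sketch below evaluates (9) and (10) and then forms long partial sums of (6) and (7) in place of the infinite ones; the function names and the number of terms are assumptions made for this illustration. The expansion of (9) for large $\bar{s}_0$ gives $\bar{s} \approx \bar{s}_0 + 1/2$, which the last line of the usage example makes visible.

```python
import math


def solve_no_cutoff(s0):
    """Return (sbar, A) from (9) and (10) for a statistical average separation s0 > 0."""
    sbar = 1.0 / math.log1p(1.0 / s0)
    A = 1.0 / (1.0 + s0)
    return sbar, A


def check_sums(s0, terms=50_000):
    """Partial sums of (6) and (7) with the cut-off pushed far out.

    The results should approach 1 and s0 respectively.
    """
    sbar, A = solve_no_cutoff(s0)
    q = math.exp(-1.0 / sbar)
    norm = sum(A * q**s for s in range(terms))
    mean = sum(s * A * q**s for s in range(terms))
    return norm, mean


if __name__ == "__main__":
    print(check_sums(2.8))
    print(solve_no_cutoff(1000.0)[0] - 1000.0)   # close to 1/2 for large s0
```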



3.2 The General Case 

Formally performing the summations and rearranging, (6), (7), and (8) may be cast into the following useful
forms (letting $q$ denote $\exp(-1/\bar{s})$):

$$1 + \frac{f}{\pi_2} = \frac{A}{1-q} , \qquad q^{L+1} = \frac{f}{\pi_2 + f} , \qquad \bar{s}_0 = \frac{q}{1-q} - (L+1) \frac{f}{\pi_2} . \tag{11}$$

These may be solved numerically for the set of parameters $A$, $\bar{s}$ and $L$ once the inputs $\bar{s}_0(\pi_1, \pi_2)$, $\pi_2$, and
$f$ are specified.
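
One possible way of carrying out such a numerical solution is sketched below. It is an illustration under stated assumptions rather than the procedure used for this paper: $L$ is treated as a real number, $L + 1$ is eliminated between the closed-form relations so that a single equation in $q$ remains, and that equation is solved by simple bisection (our choice of method). The function name `solve_with_cutoff` is hypothetical, and the usage example takes the counts for $N = 10^6$ (78498 primes, 8169 twins) with $f = 1$ purely as illustrative inputs.

```python
import math


def solve_with_cutoff(s0, pi2, f, iterations=200):
    """Solve the three closed-form relations for (A, sbar, L); L is treated as real.

    Assumes 0 < f <= pi2 and s0 > 0.
    """
    phi = f / pi2
    c = phi * math.log1p(1.0 / phi)           # equals (L + 1) * phi * (-log q)

    def residual(q):
        # average-separation relation with L + 1 eliminated
        # using q**(L + 1) = phi / (1 + phi)
        return q / (1.0 - q) - c / (-math.log(q)) - s0

    lo, hi = 1e-12, 1.0 - 1e-12               # residual(lo) < 0 < residual(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    A = (1.0 + phi) * (1.0 - q)
    sbar = -1.0 / math.log(q)
    L = math.log1p(1.0 / phi) / (-math.log(q)) - 1.0
    return A, sbar, L


if __name__ == "__main__":
    pi1, pi2, f = 78498, 8169, 1.0
    s0 = (pi1 - 2 * pi2) / pi2
    print(solve_with_cutoff(s0, pi2, f))
```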

Instead, to get an idea of the general behaviour, we make a well-motivated approximation to the final
equation listed. In the regime described by the model, $f \ll \pi_2$ for reasonable values of $f$, and we expect
that the factor of $L + 1$ is insufficient to render the second term on the rhs appreciable. Put another way,
the average value of the separation is rather insensitive to the precise value of the maximum separation
because relatively few separation events are maximal or near maximal. Writing

$$\bar{s}_0 \approx \frac{q}{1-q} , \tag{12}$$

we immediately see that the solution for $\bar{s}$ is exactly the same as above (9). While this is an approximation,
it is also a consequence of the relative insensitivity to the cut-off, as was strongly suggested by the success of
the continuous analysis in [1]. With this result in hand, it is possible to solve for $A$ and $L$ without further
approximations. The normalisation constant

$$A = \left( 1 + \frac{f}{\pi_2} \right) \frac{1}{1 + \bar{s}_0} \tag{13}$$

is slightly greater than its value derived in the case of no cut-off, while

$$L = -1 + \frac{\log\left( 1 + \frac{\pi_2}{f} \right)}{\log\left( 1 + \frac{1}{\bar{s}_0} \right)} \tag{14}$$

describes the growth in the cut-off in terms of the prime counts, statistical average separation, and risk
factor.
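
The approximate solution is simple enough to evaluate in a few lines. The following sketch implements (9), (13) and (14) as written; the function name `approximate_cutoff_solution` is hypothetical, and the inputs in the usage example (the counts for $N = 10^6$, namely 78498 primes and 8169 twins) are illustrative only.

```python
import math


def approximate_cutoff_solution(pi1, pi2, f=1.0):
    """Evaluate (9), (13) and (14) from the prime count, twin count and risk factor."""
    s0 = (pi1 - 2 * pi2) / pi2                          # statistical average separation
    sbar = 1.0 / math.log1p(1.0 / s0)                   # (9)
    A = (1.0 + f / pi2) / (1.0 + s0)                    # (13)
    L = math.log1p(pi2 / f) / math.log1p(1.0 / s0) - 1.0   # (14)
    return s0, sbar, A, L


if __name__ == "__main__":
    for f in (0.1, 1.0, 10.0):
        print(f, approximate_cutoff_solution(78498, 8169, f))
```

Because $f$ enters (14) only through a logarithm, changing the risk factor by an order of magnitude shifts $L$ by a modest amount.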

In Figure 3 we have replotted the actual thresholds obtained in our analysis of the likely maximal
separations as performed in [2]. Furthermore, we now include values of $L$ predicted by (14) using Nicely's
counts of primes and twins as inputs, and choosing the risk factor to be equal to 1.

In the region in which there is overlap between the observed thresholds, our earlier computed estimates
for the maximum separation at given $N$, and the results obtained by direct computation using (14), the
agreement is exceptional. The spectacular agreement between the predicted cut-offs and the maximal gap
predictions of [2] persists above the range over which we have data. While this was not unexpected, it
provides further reassurance of the consistency of our model for the distribution of twins.



4 Conclusion 

Very careful examination of the curve in Figure 2 reveals that the data exhibit a slight tendency suggestive
of negative curvature. Naive theoretical considerations suggest attempting a three-parameter fit of the form

$$\bar{s}_0(\pi_1) = S_2 \log(\log(\pi_1)) + S_1 \log(\pi_1) + S_0 . \tag{15}$$

Fitting to the same data as before, the empirical values obtained for the constants $S_0$, $S_1$, and $S_2$ are

$$S_0 = -3.55 \pm 0.07 , \quad S_1 = 0.745 \pm 0.001 , \quad \text{and} \quad S_2 = 1.10 \pm 0.03 .$$
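
To give a feeling for how close the two fits are, the sketch below evaluates both with the central values of the fitted constants and reports the largest difference on a grid. The grid range in $\log(\pi_1)$ and the function names are assumptions chosen for illustration, and the quoted uncertainties are ignored.

```python
import math


def linear_fit(x, S1=0.7918, S0=-1.194):
    """Two-parameter fit; x stands for log(pi1)."""
    return S1 * x + S0


def three_parameter_fit(x, S2=1.10, S1=0.745, S0=-3.55):
    """Three-parameter fit (15); x stands for log(pi1), so log(x) is log(log(pi1))."""
    return S2 * math.log(x) + S1 * x + S0


def largest_difference(x_min, x_max, steps=200):
    """Largest absolute difference between the two fits on an evenly spaced grid."""
    xs = [x_min + (x_max - x_min) * k / steps for k in range(steps + 1)]
    return max(abs(three_parameter_fit(x) - linear_fit(x)) for x in xs)


if __name__ == "__main__":
    print(largest_difference(18.0, 30.0))
```

Over this range the two curves differ by only a few hundredths, while the fitted values themselves are of order 15 to 20.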

From a practical perspective, it will require considerable effort to extend the data into a regime in which
the three-parameter fit is clearly distinguished from the linear approximation. This is just a manifestation
of the extremely slow growth of the function $\log_2(x) = \log(\log(x))$. We also note that the pathology of
negative intercepts reappears with increased strength. Again, however, we can claim that this pathology
is benign since it implies a lower limit $\pi_1(N_0) \approx \exp(4.5) \approx 100$, leading to an estimate that $N_0$ is on the
order of 500, which is also far below the regime in which our statistical model is applicable. Furthermore,
the precise values obtained for $S_0$ and $S_2$ were rather sensitive to the range of data over which the fit was
performed, which leads us to believe that safer estimates for these coefficients and their errors are

$$S_0 = -3.3 \pm 0.4 , \quad S_1 = 0.75 \pm 0.01 , \quad \text{and} \quad S_2 = 1.0 \pm 0.2 .$$

Figure 3: Maximal expected prime separation vs. $\log(N)$. The x's in the lower left mark the actual onsets
of successive maxima in the gap spectrum. The dotted curve is the prediction for $f = 1$ from the analysis
in [2]. The +'s denote the computed values of $L$ from (14).

In this paper, we have strengthened the case for, and extended the utility of, our characterisation of
the distribution of twin primes as random fixed-probability "events" among the primes. We have done this
by first performing the analysis of the model without passing to the continuous (integral) limit and have 
demonstrated that there are no obstacles. Furthermore we have made new and self-consistent predictions 
for the occurrence of "gaps" (maximum separations) and these are seen to conform well to the actual data 
and to our previous model analysis. Perhaps the most important result going forward is our successful 
reformulation of the model which has enabled it to accept as inputs the raw counts of primes and twins 
below a certain N, along with a risk-factor (of order 1) to which the model is fairly insensitive. 

Lastly, we note that, particularly in this reformulated form without accounting for the cut-off, the empirical
model is predictive since the factors which enter into the Ansatz (4), $A$ and $\bar{s}$, are determined from $\bar{s}_0$
alone, whose behaviour is captured by (5). Given additional knowledge of $\pi_2(N)$ we can choose a risk factor
and determine a more precise prediction for the spectrum including maximal expected prime separation.